[For Everyone] From Efficient Division of Labor and Embracing Uncertainty to Self-Review
Description
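We tend to assume that bigger AI is always better. But what if an AI could be as knowledgeable as a large corporation while thinking at the cost of a small team? Wouldn't that be cooler? In this episode, we start from several recent papers and look at how AI learns to act as a smart "dispatcher," how it learns faster by admitting "uncertainty" the way an apprentice would, and even how "reviewing its own performance" and "highlighting the key points" let it truly learn from its mistakes. Get ready to explore the path toward smarter, more efficient AI!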
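00:00:33 The little secret of large AI models: how can a "small team" beat a "big company"?
00:05:55 Smart "grunt work": how can robots learn faster?
00:12:08 How many steps does it take to teach AI to learn from its mistakes?
00:17:27 AI's "seven-second memory" problem: how can "highlighting the key points" solve it?
00:23:06 The robot apprentice: from "clumsy imitation" to surpassing the master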
Papers covered in this episode:
[CL] Sigma-MoE-Tiny Technical Report
[Microsoft Research]
https://arxiv.org/abs/2512.16248
---
[LG] Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
[UC Berkeley & Stanford]
https://arxiv.org/abs/2512.16911
---
[LG] Meta-RL Induces Exploration in Language Agents
[EPFL & Idiap Research Institute]
https://arxiv.org/abs/2512.16848
---
[LG] Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference
[Microsoft Research India]
https://arxiv.org/abs/2512.16391
---
[RO] ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning
[University of Toronto & Georgia Institute of Technology & NVIDIA Research]
https://arxiv.org/abs/2512.16861



